An Analysis of the Available Data
Figure 1B (study vs. reproduction from data): Individual Change in Plaque Volume. (B) The red line represents the median change (0.8%), and the shaded area represents the IQR (0.3%-1.7%).
Figure 1A (study vs. reproduction from data): Individual Change in Plaque Volume. (A) The red line represents the median change (18.9 mm3), and the shaded area represents the IQR (9.3-47.0 mm3).
Figure 2F (study vs. reproduction from data): Changes in Total Plaque Score vs Coronary Artery Calcium. (C, F) Only CAC is associated with changes in NCPV and TPS. The regression line was fitted with the R function “lm,” which regresses y ~ x, and the shaded area represents the standard error.
4 Simple Linear Regression Assumptions
3 are testable with the data:
Linearity between the predictor and the outcome
Constant variance (homoscedasticity) of residuals
Normally distributed residuals
(The fourth, independence of errors, must be justified by study design rather than tested against the data.)
These linear assumptions are quantifiable and objectively testable.
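As an illustration, the three data-testable checks can be run objectively on any fitted univariable model. The sketch below uses simulated data, loosely shaped like a ΔNCPV ~ CAC_bl regression (none of the values are the study's), with numpy/scipy: Shapiro-Wilk for residual normality, a hand-rolled Breusch-Pagan test for constant variance, and a RESET-style quadratic-term test for linearity.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated data with right-skewed errors whose spread grows with x
# (illustrative values only, not the study's data).
n = 100
x = rng.uniform(0, 300, n)                                  # e.g. baseline CAC
y = 5 + 0.2 * x + (1 + 0.02 * x) * rng.gamma(2.0, 5.0, n)   # skewed, heteroskedastic

# Ordinary least squares fit
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# 1) Residual normality: Shapiro-Wilk
p_normal = stats.shapiro(resid).pvalue

# 2) Constant variance: Breusch-Pagan (LM = n * R^2 from
#    regressing squared residuals on x; chi-square with 1 df)
u2 = resid**2
g, *_ = np.linalg.lstsq(X, u2, rcond=None)
r2 = 1 - np.sum((u2 - X @ g) ** 2) / np.sum((u2 - u2.mean()) ** 2)
p_homosked = stats.chi2.sf(n * r2, df=1)

# 3) Linearity: RESET-style check -- add a quadratic term and
#    t-test its coefficient (a significant x^2 term flags curvature)
X2 = np.column_stack([np.ones(n), x, x**2])
b2, *_ = np.linalg.lstsq(X2, y, rcond=None)
res2 = y - X2 @ b2
sigma2 = np.sum(res2**2) / (n - 3)
cov = sigma2 * np.linalg.inv(X2.T @ X2)
t_quad = b2[2] / np.sqrt(cov[2, 2])
p_linear = 2 * stats.t.sf(abs(t_quad), df=n - 3)

print(f"normality p={p_normal:.4f}, constant variance p={p_homosked:.4f}, "
      f"linearity p={p_linear:.4f}")
```

The same checks run in seconds on any of the four univariable models; `performance::check_model` in R wraps analogous diagnostics behind plots.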
“All linear model assumptions were corroborated with the R function performance::check_model.” (direct quote from the study)
Actual Assumption Tests
| Model | β | Linearity | Constant Variance | Residual Normality |
|---|---|---|---|---|
| ΔNCPV ~ CAC_bl | β = 0.18, p < 0.001 | Violation (p = 0.031) | Violation (p = 0.001) | Violation (p < 0.001) |
| ΔNCPV ~ NCPV_bl | β = 0.25, p < 0.001 | OK (p = 0.198) | Violation (p < 0.001) | Violation (p < 0.001) |
| ΔNCPV ~ PAV_bl | β = 5.48, p < 0.001 | Borderline (p = 0.050) | Violation (p < 0.001) | Violation (p < 0.001) |
| ΔNCPV ~ TPS_bl | β = 7.37, p < 0.001 | OK (p = 0.132) | Violation (p < 0.001) | Violation (p = 0.001) |
Objective tests show all 4 models failed at least 2 tests.
- quote from letter to the editor
- quote from the reply to a letter to the editor
Calling residual-plot evaluation “subjective” is misleading.
Visual checks are interpretive, but these linear assumptions are quantifiable and objectively testable.
“CONCLUSIONS In lean metabolically healthy people on KD, neither total exposure nor changes in baseline levels of ApoB and LDL-C were associated with changes in plaque.”
| Abstract claim component | Model | Model reported |
|---|---|---|
| Δ-plaque vs LDL-C exposure | Δ-NCPV ~ LDL-C exposure | Not reported |
| Δ-plaque vs LDL-C baseline | Δ-NCPV ~ LDL-C baseline | Not reported |
| Δ-plaque vs ΔLDL-C | Δ-NCPV ~ ΔLDL-C | Not reported |
| Δ-plaque vs ApoB exposure | Δ-NCPV ~ ApoB exposure | Not reported |
| Δ-plaque vs ApoB baseline | Δ-NCPV ~ ApoB baseline | Reported |
| Δ-plaque vs ΔApoB | Δ-NCPV ~ ΔApoB | Reported |
| N/A | NCPV_final ~ LDL-C exposure | Reported (NCPV_final, PAV_final) |
“Results Neither change in ApoB …, baseline ApoB, nor total LDL-C exposure … were associated with the change in noncalcified plaque volume (NCPV) or TPS.”
“Neither … change in ApoB nor the ApoB level … were associated … with TPS (Figures 2D and 2E, Table 3).” - “changes in and baseline levels of ApoB were not associated with changes in NCPV or TPS”
Figures 2D–2F are Δ-TPS (outcome) panels (vs ΔApoB, ApoB, CAC_bl)
Table 3 has no Δ-TPS models
No Δ-TPS ~ LDL-C exposure models or results anywhere.
“LDL-C exposure on a KD was calculated by summing the products of the reported days on a KD prior to study commencement and baseline LDL-C on a KD plus the study follow-up days by their final LDL-C.”
\[ \text{LDL-C}_{\text{exp}} = Days_{\text{KD}}\cdot LDL_{\text{baseline}} \;+\; Days_{\text{follow-up}}\cdot LDL_{\text{final}} \]
“Estimated lifelong LDL-C additionally included the product of age upon commencing a KD and pre-KD LDL-C.”
\[ \text{Life-LDL-C}_{\text{exp}} = Days_{\text{KD}}\cdot LDL_{\text{baseline}} \;+\; Days_{\text{follow-up}}\cdot LDL_{\text{final}} \;+\; \boldsymbol{\big( Age_{\text{at-KD-start}}\cdot LDL_{\text{pre-KD}} \big)} \]
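Under the stated definitions, both exposure metrics are straightforward arithmetic. A minimal sketch with hypothetical numbers; the units (mg/dL × days, with age converted to days so all three terms match) are this sketch's assumption, since the paper's text leaves them implicit:

```python
def lifetime_ldl_exposure(days_kd, ldl_baseline, days_followup, ldl_final,
                          age_at_kd_start_days, ldl_pre_kd):
    """Estimated lifetime LDL-C exposure in (mg/dL)*days, per the paper's
    definition: pre-study KD exposure + in-study exposure + pre-KD lifetime
    exposure (age at KD start times pre-KD LDL-C)."""
    return (days_kd * ldl_baseline
            + days_followup * ldl_final
            + age_at_kd_start_days * ldl_pre_kd)

# Hypothetical participant: 4 years on KD pre-study at LDL-C 250 mg/dL,
# 365 follow-up days ending at 240 mg/dL, KD started at age 40 with a
# pre-KD LDL-C of 120 mg/dL.
exposure = lifetime_ldl_exposure(4 * 365, 250, 365, 240, 40 * 365, 120)
print(exposure)  # 2,204,600 (mg/dL)*days
```

Note how the third term dominates: age at KD start contributes roughly 80% of the total here, which is exactly why age and "lifetime exposure" are nearly collinear in the study's models.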
“Estimated lifetime LDL-C exposure was only a significant predictor of final NCPV in the univariable analysis but lost significance when age was included as a covariate (Table 3). Both age and lifetime LDL-C exposure lost significance when baseline CAC was included in the model (Table 3).”
This is not a mediation analysis. A mediation analysis decomposes a total effect into direct and indirect (mediated) pathways, which requires an explicit causal model and an estimate of the indirect effect; watching a coefficient lose significance as covariates are added is not that.
They ran (reported) three regressions in sequence.
Conclusion: after adjusting for baseline CAC, neither age nor lifetime LDL-C predicts NCPV_final.
Changes in p-values come from collinearity due to Age being embedded in lifetime LDL-C exposure.
Age is a confounder/proxy for exposure duration, not a mediator.
They did NOT report NCPV_final ~ lifetime LDL-C exposure model results.
“Sensitivity analyses on participants with >80% of bHB measurements above 0.3 mmol/L (Supplemental Tables 2 to 4) and with high calculated 10-year cardiovascular risk showed similar results to those just reported (Supplemental Table 5).”
This is not a sensitivity analysis. This is a subgroup analysis.
Sensitivity analysis: demonstrate robustness of conclusions to reasonable alternative assumptions or analytic choices
Subgroup analysis: assess differences of effects across subsets of the dataset (e.g., high vs low adherence; low vs high baseline risk).
Properly, this is tested with interaction terms in the full sample, not by splitting the data, re-running the same models on each subset, and eyeballing the p-values.
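A sketch of the interaction-term approach on simulated data (all values hypothetical; plain numpy/scipy OLS rather than the study's R code): the coefficient on the ApoB × subgroup product tests whether the ApoB slope actually differs between subgroups.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated full-sample data (illustrative; not the study's data).
n = 120
apob = rng.normal(130, 30, n)                     # predictor of interest
high_adherence = rng.integers(0, 2, n)            # hypothetical subgroup flag
d_ncpv = 20 + 0.05 * apob + rng.normal(0, 15, n)  # outcome: change in NCPV

# Interaction model: d_ncpv ~ apob * high_adherence.
# The interaction coefficient (last column) is the difference in the
# ApoB slope between subgroups -- the actual subgroup question.
X = np.column_stack([np.ones(n), apob, high_adherence, apob * high_adherence])
beta, *_ = np.linalg.lstsq(X, d_ncpv, rcond=None)
resid = d_ncpv - X @ beta
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_inter = beta[3] / se[3]
p_inter = 2 * stats.t.sf(abs(t_inter), df=dof)
print(f"interaction p = {p_inter:.3f}")
```

A single p-value on the interaction term replaces the "same p-value pattern in both halves" eyeball test, and it uses the full sample's power.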
“Bayes factors were calculated…with default settings and an rscale value of 0.8 to contrast a moderately informative prior with a conservative distribution width (to allow for potential large effect sizes) due to the well-documented association between ApoB changes and coronary plaque changes.”
Calling 0.8 “moderately informative” is inaccurate.
R package docs: “medium”, “wide”, “ultrawide” = 0.354, 0.5, 0.707
rscaleCont = 0.8 is wider than “ultrawide” → a very diffuse prior that places substantial mass on very large effects.
Fixed description:
“We used a very wide prior on coefficients (rscale = 0.8, wider than the package’s ‘ultrawide’), which places substantial prior mass on very large effects. This diffuse prior penalizes small-to-moderate effects, requiring substantially stronger evidence than under the ‘wide’ or ‘medium’ defaults to support them.”
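The width claim is easy to verify numerically: the BayesFactor defaults place a Cauchy(0, rscale) prior on the standardized effect, so one can compare how much prior mass each named scale puts on very large effects. A sketch in Python with scipy (the "study" label is this document's shorthand for rscale = 0.8, not a package preset):

```python
from scipy import stats

# Cauchy(0, rscale) prior on the standardized effect size, as in the
# BayesFactor package defaults. P(|effect| > 1) under each scale:
masses = {}
for label, rscale in [("medium", 0.354), ("wide", 0.5),
                      ("ultrawide", 0.707), ("study", 0.8)]:
    masses[label] = 2 * stats.cauchy.sf(1.0, loc=0, scale=rscale)
    print(f"{label:10s} rscale={rscale:.3f}  P(|effect|>1) = {masses[label]:.3f}")
```

With rscale = 0.8, roughly 43% of the prior mass sits on standardized effects larger than 1, versus about 22% under the "medium" default: a diffuse prior, not a moderately informative one.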
Title is now: “Longitudinal Data From the KETO-CTA Study: Plaque Predicts Plaque, ApoB Does Not”
Find and replace ‘begets’ with ‘predicts’.
“Most participants presented with stable NCPV (Figures 1A and 1B), with 1 participant exhibiting a decrease in NCPV”
That interpretation of Figures 1A and 1B was removed and replaced with:
“The median change in NCPV was 18.9 mm3 (IQR: 9.3-47.0 mm3) and the median change in PAV was 0.8% (IQR: 0.3%-1.7%).”
Table 1 median (Q1–Q3) PAV at baseline changed from 1.25% (0.5–3.6) in the preprint to 1.6% (0.5–4.9).
The two violations you keep seeing—non-normality and heteroskedasticity—are largely driven by the outcome’s distribution (ΔNCPV) and its mean–variance pattern. Swapping predictors (e.g., APOB vs CAC) usually won’t fix those. So it’s likely most univariable Δ models would show the same two problems.
The simple linear model breaks two key rules: the residuals aren’t normally distributed and their spread changes with x (heteroskedasticity).
The estimated slope (the “trend”) can still be a good average summary of how y changes with x.
But the usual p-values/confidence intervals from ordinary least squares (OLS) can’t be trusted: the standard-error formula is wrong under heteroskedasticity, and non-normality hurts small-sample tests.
If your OLS p-value < 0.05: treat it as suggestive, not definitive. Recompute using heteroskedasticity-robust or bootstrap methods. It may stay significant, or it may not.
If your OLS p-value ≥ 0.05: you can’t conclude “no association.” The test might be too noisy or mis-calibrated. Recheck with robust/bootstrapped standard errors and report the effect size with a confidence interval.
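A sketch of that recomputation on simulated heteroskedastic data (illustrative values only): classical OLS standard errors next to HC1 "sandwich" standard errors and a pairs-bootstrap confidence interval for the slope.

```python
import numpy as np

rng = np.random.default_rng(2)

# Heteroskedastic toy data: error spread grows with x (illustrative only).
n = 80
x = rng.uniform(0, 10, n)
y = 1 + 0.5 * x + x * rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Classical OLS standard errors (assume constant variance)
sigma2 = resid @ resid / (n - 2)
se_ols = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC1 heteroskedasticity-robust (sandwich) standard errors
meat = X.T @ (X * resid[:, None] ** 2)
cov_hc1 = n / (n - 2) * XtX_inv @ meat @ XtX_inv
se_hc1 = np.sqrt(np.diag(cov_hc1))

# Nonparametric pairs bootstrap for the slope
boot = np.empty(2000)
for b in range(2000):
    idx = rng.integers(0, n, n)
    boot[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0][1]
ci_lo, ci_hi = np.percentile(boot, [2.5, 97.5])

print(f"slope = {beta[1]:.3f}, classical SE = {se_ols[1]:.3f}, "
      f"HC1 SE = {se_hc1[1]:.3f}, bootstrap 95% CI = ({ci_lo:.3f}, {ci_hi:.3f})")
```

In R the equivalent one-liner is `sandwich::vcovHC` with `lmtest::coeftest`; the point is that the slope estimate is unchanged while its uncertainty is re-estimated honestly.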
The conventional OLS standard errors, t-tests, and CIs are invalid under heteroskedasticity; non-normality further invalidates small-sample t-inference.
In plain terms: the model breaks three core assumptions: residuals aren’t normal, their spread changes with x, and the relationship isn’t actually linear. Because of the changing spread, the usual p-values/intervals are mis-calibrated (they can be too small or too big). Because the relationship isn’t linear, the reported “slope” isn’t a clear effect; it’s just a weighted average of a curved pattern, and its size (or even its sign) may not reflect the true relationship.
If the reported OLS p-value < 0.05: this suggests a non-zero average linear trend, but inference is mis-calibrated and the effect has no clear meaning under a misspecified (nonlinear) model. The “significance” may be an artifact.
If the reported OLS p-value ≥ 0.05: this is a non-result from a mis-calibrated, misspecified model. It does not justify concluding “no association,” and it may mask real patterns due to the nonlinear form and changing variance.
Interpretation of reported results:
p < 0.05: evidence only for a non-zero projected linear component under misspecification; inference is mis-sized and the estimand lacks a clear causal/functional meaning.
p ≥ 0.05: absence of evidence from a misspecified, mis-sized test; it does not speak to the presence or absence of a true association.
Assumptions of a linear model (and why they matter)
Linearity of the mean: the average outcome changes in a straight-line way with the predictor. Why it matters: if the true pattern is curved, the slope summarizes the wrong thing and can misstate direction and size.
Independence of errors: observations don’t carry leftover information about each other (no autocorrelation/clustering). Why it matters: Dependence makes uncertainty estimates too small or too large.
Constant variance (homoskedasticity): the scatter of errors is roughly the same across the predictor. Why it matters: If the spread grows or shrinks, standard errors and p-values from the basic model are mis-calibrated.
Approximately normal errors (mainly for small samples): error terms are roughly bell-shaped. Why it matters: The usual t-tests and confidence intervals rely on this; strong departures undermine those calculations.
Exogeneity / no systematic bias: on average, errors are unrelated to the predictor (no omitted confounders correlated with x). Why it matters: Violations bias the slope itself, not just its uncertainty.
No exact collinearity (relevant in multivariable settings): predictors aren’t exact copies of each other. Why it matters: Otherwise the model can’t isolate individual effects. (Not an issue in a single-predictor model.)
The study’s univariable ‘ΔNCPV ~ APOB’ analysis is not decisive. The appropriate test is APOB’s partial association in a follow-up model that controls for baseline NCPV (and age/sex). Only \( H_0\!: \beta_{\text{APOB}} = 0 \) in follow-up ~ baseline + APOB + covariates addresses whether APOB is associated with follow-up independent of baseline.
The reported null does not address whether APOB is associated with follow-up conditional on baseline, age, and sex, which is the clinically relevant estimand.
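A minimal sketch of that estimand on simulated data (hypothetical values; plain numpy OLS, not the study's code): fit follow-up ~ baseline + ApoB + age + sex and t-test the ApoB coefficient.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Simulated cohort (illustrative only; not KETO-CTA data).
n = 100
baseline_ncpv = rng.gamma(2.0, 20.0, n)
age = rng.uniform(35, 65, n)
sex = rng.integers(0, 2, n)
apob = rng.normal(130, 30, n)
followup_ncpv = 5 + 1.1 * baseline_ncpv + 0.3 * age + rng.normal(0, 10, n)

# Partial-association model: follow-up ~ baseline + ApoB + age + sex.
# H0: beta_apob = 0, i.e. ApoB adds nothing beyond baseline plaque and
# demographics -- the clinically relevant question.
X = np.column_stack([np.ones(n), baseline_ncpv, apob, age, sex])
beta, *_ = np.linalg.lstsq(X, followup_ncpv, rcond=None)
resid = followup_ncpv - X @ beta
dof = n - X.shape[1]
cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
t_apob = beta[2] / np.sqrt(cov[2, 2])
p_apob = 2 * stats.t.sf(abs(t_apob), df=dof)
print(f"beta_apob = {beta[2]:.3f}, p = {p_apob:.3f}")
```

This is a conditional model (follow-up given baseline), not a univariable change-score regression; the two can give different answers, and only the conditional one targets the estimand described above.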